Unroll the integer-part digit scan (straight-line for the common 1-5 digit case) #381
Merged
…digit case)

parse_number_string scans the integer part one byte at a time in a while loop, while the fraction already uses the 8-digit SWAR loop. Most integer parts are 1-5 digits, so the loop back-edge dominates. Peel the first five iterations into nested ifs, falling through to the original while for longer runs. Semantics are identical (i = 10*i + digit, advancing p); no behavior change.

AWS m8g.metal-24xl (Graviton4), -O3 -march=native, simple_fastfloat_benchmark, from_chars->double. Base vs patch measured back-to-back, mean of 2 runs:

canada: gcc +3.1%, clang +2.8%
mesh: gcc +5.4%, clang +5.1%
random: ~flat (1-digit integer part)

No regression; gcc and clang agree.

Alternatives benchmarked and rejected: reusing loop_parse_if_eight_digits for the integer part regressed 5-8% (integer parts are too short for 8-digit SWAR setup); a counted for(k<5) loop matched on gcc but clang optimized it worse (canada -0.9%). The explicit peel is the only form solidly positive on both compilers.
The integer part of a number is scanned one byte at a time, while the fractional
part already uses the 8-digit SWAR loop (`loop_parse_if_eight_digits`). Integer parts
are usually short (1–5 digits), so the loop back-edge is a large share of the cost.

This peels the first five iterations into straight-line `if`s and falls through to the
original loop for longer inputs. The arithmetic is unchanged (`i = 10*i + digit`), so
behavior is identical; one file, +29/−6, in the `UC`-templated path.

Benchmark:
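`m8g.metal-24xl` (Graviton4), `-O3 -march=native`, `simple_fastfloat_benchmark`,
`from_chars` → `double`, base vs patch measured back-to-back (mean of 2 runs):

| input | gcc | clang |
| --- | --- | --- |
| canada | +3.1% | +2.8% |
| mesh | +5.4% | +5.1% |
| random | ~flat | ~flat |

random is `0.xxx` (a 1-digit integer part), so it is unaffected, as expected. No
regression on any input.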
For completeness I also tried reusing `loop_parse_if_eight_digits` for the integer
part, and a counted `for (k < 5)` loop; both were slower here (the 8-digit SWAR setup
does not pay off for short integer parts, and clang optimized the counted loop less
well), so this keeps the explicit peel.
Tests: `FASTFLOAT_TEST` 14/14 and `FASTFLOAT_EXHAUSTIVE` (exhaustive32 / 32_64 /
midpoint / long variants) all pass. Builds clean on gcc and clang at C++11 and C++20
under `-Werror -Wall -Wextra -Weffc++ -Wconversion -Wsign-conversion -Wshadow`,
clang-format clean. No new multi-byte reads, so big-endian (s390x) is unaffected.